Abstract. This paper contains two results concerning random $n \times n$ Bernoulli
matrices. First, we show that with probability tending to one the determinant
has absolute value $\sqrt{n!}\,\exp(O(\sqrt{n \ln n}))$. Next, we prove a new upper bound
$.939^n$ on the probability that the matrix is singular.



1. Introduction 

Let $n$ be a large integer parameter, and let $M_n$ denote a random $n \times n$ ±1 matrix
("random" meaning with respect to the uniform distribution, i.e., the entries of $M_n$
are i.i.d. Bernoulli random variables). Throughout the paper, we assume that $n$ is
sufficiently large, whenever needed. We use $o(1)$ to denote any quantity which goes
to zero as $n \to \infty$, keeping other parameters (such as $\epsilon$) fixed.

This model of random matrices is of considerable interest in many areas, including 
combinatorics, theoretical computer science and mathematical physics. On the 
other hand, many basic questions concerning this model have been open for a long 
time. In this paper, we focus on the following two questions: 

Question 1. What is the typical value of the determinant of $M_n$?
Question 2. What is the probability that $M_n$ is singular?

Let us first discuss Question 1. From Hadamard's inequality, we have the bound
$|\det(M_n)| \le n^{n/2}$, with equality if and only if $M_n$ is an Hadamard matrix. However,
in general we expect $|\det(M_n)|$ to be somewhat smaller than $n^{n/2}$. Indeed, from
the simple estimate^1

$$\mathbf{E}((\det M_n)^2) = n! \qquad (1)$$

(first observed by Turán [13]), one is led to conjecture that $|\det(M_n)|$ should be of
the order of $\sqrt{n!} = e^{-n/2+o(n)} n^{n/2}$ with high probability. On the other hand, even
proving that $|\det M_n|$ is typically positive (or equivalently, that $M_n$ is typically
non-singular) is already a non-trivial task. This task was first done by Komlós [7]
(see Theorem 1.2 below).

^1 Indeed, one can prove (1) by expanding $\det M_n$ as the sum of $n!$ signs, and observing that all
the covariances vanish.
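
As a small sanity check of the identity (1), the following sketch enumerates every ±1 matrix of a tiny size and averages the squared determinant. The helper names `det` and `mean_square_det` are illustrative choices, not notation from the paper, and the brute-force enumeration is only feasible for very small $n$.

```python
from itertools import product
from math import factorial

def det(rows):
    """Integer determinant by cofactor expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for col in range(n):
        minor = [r[:col] + r[col + 1:] for r in rows[1:]]
        total += (-1) ** col * rows[0][col] * det(minor)
    return total

def mean_square_det(n):
    """Average of det(M)^2 over all 2^(n*n) matrices with entries in {-1, 1}."""
    total = 0
    count = 0
    for entries in product((-1, 1), repeat=n * n):
        rows = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        total += det(rows) ** 2
        count += 1
    return total / count

for n in (1, 2, 3):
    # The two printed numbers should agree, as predicted by (1).
    print(n, mean_square_det(n), factorial(n))
```

The check is exact because all quantities are integers until the final division.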

The first main result of this paper shows that with probability tending to one (as
$n$ tends to infinity), the absolute value of the determinant is very close to $\sqrt{n!}$.

Theorem 1.1.

$$\mathbf{P}\big(|\det M_n| \ge \sqrt{n!}\,\exp(-29 n^{1/2} \ln^{1/2} n)\big) = 1 - o(1).$$

The constant 29 is generous, but we do not try to optimize it.

Note from (1) and Chebyshev's inequality that

$$\mathbf{P}(|\det M_n| \le \omega(n)\sqrt{n!}) = 1 - o(1)$$

for any function $\omega(n)$ which goes to infinity as $n \to \infty$. Combining this with the
preceding theorem and the observation that $\det M_n$ is symmetric around the origin,
it follows that for each sign ±, we have the concentration inequality

$$\det(M_n) = \pm\sqrt{n!}\,\exp(O(n^{1/2} \ln^{1/2} n))$$

with probability $1/2 - o(1)$.

Let us now turn to the problem of determining the probability that $M_n$ is singular.
As mentioned above, Komlós showed, in 1967, that

Theorem 1.2. [7] $\mathbf{P}(\det M_n = 0) = o(1)$.

The task here is to give a precise formula for the $o(1)$ on the right-hand side. Since
a matrix $M_n$ with two identical (or opposite) rows or two identical (or opposite)
columns is necessarily singular, it is easy to see that

$$\mathbf{P}(\det M_n = 0) \ge (1 - o(1))\, n^2 2^{1-n}.$$

It has often been conjectured (see e.g. [10], [6]) that this is the dominant source of
singularity. More precisely,

Conjecture 1.3.

$$\mathbf{P}(\det M_n = 0) = (1 - o(1))\, n^2 2^{1-n}.$$

Prior to this paper, the best partial result concerning this conjecture is the following,
due to Kahn, Komlós and Szemerédi [6]:

Theorem 1.4. [6] We have $\mathbf{P}(\det M_n = 0) \le (1 - \epsilon + o(1))^n$, where $\epsilon := .001$.



Our second main result is the following improvement of this theorem:

Theorem 1.5. We have $\mathbf{P}(\det M_n = 0) \le (1 - \epsilon + o(1))^n$, where $\epsilon := .06191$.

This value of $\epsilon$ is the unique solution in the interval $(0, 1/2)$ to the equation

$$h(\epsilon) + \frac{\epsilon}{\log_2 (16/15)} = 1, \qquad (2)$$

where $h$ is the entropy function

$$h(\epsilon) := \epsilon \log_2 \frac{1}{\epsilon} + (1 - \epsilon) \log_2 \frac{1}{1 - \epsilon}.$$

We prove Theorem 1.5 in Sections 5-7. Our argument uses several key ideas from
the original proof of Theorem 1.4 in [6], but invoked in a simpler and more direct
fashion. In a sequel to this paper [11] we shall use more complicated arguments to
improve this value of $\epsilon$ further, to $\epsilon = 1/4$.
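
The constant in Theorem 1.5 can be recovered numerically from equation (2). The sketch below uses plain bisection, which works because the left-hand side of (2) is increasing on $(0, 1/2)$; the function names and the tolerance are illustrative assumptions rather than anything prescribed by the paper.

```python
from math import log2

def entropy(e):
    """Binary entropy h(e) = e*log2(1/e) + (1-e)*log2(1/(1-e))."""
    return e * log2(1 / e) + (1 - e) * log2(1 / (1 - e))

def lhs(e):
    """Left-hand side of equation (2)."""
    return entropy(e) + e / log2(16 / 15)

def solve_epsilon(tol=1e-12):
    """Bisection on (0, 1/2); lhs is increasing there, so the root is unique."""
    lo, hi = 1e-9, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lhs(mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

eps = solve_epsilon()
# eps is the exponent gain; 1 - eps is the base of the singularity bound.
print(eps, 1 - eps)
```

The resulting base $1 - \epsilon$ lies just below the value $.939$ quoted in the abstract.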

This paper is organized as follows. In Section 2 we establish some basic estimates
for the distance between a randomly selected point on the unit cube $\{-1, 1\}^n$ and
a fixed subspace, and in Section 3 we obtain similar types of estimates in the case
when the subspace is also random. In Section 4 we then apply those estimates to
prove Theorem 1.1. As a by-product, we also obtain a short proof of Theorem 1.2.
We then give the proof of Theorem 1.5 in Sections 5-7.

In this paper we shall try to emphasize simplicity. Several results obtained in
these parts can be extended or refined considerably with more technical arguments.
In the last part of the paper (Section 8), we will consider some of these
extensions/refinements. In particular, we prove an extension of Theorem 1.2 and
Theorem 1.1 for more general models of random matrices.



2. The distance between a random vector and a deterministic subspace



Let $X$ be a random vector chosen uniformly at random from $\{-1, 1\}^n$; thus $X =
(\epsilon_1, \ldots, \epsilon_n)$ where $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. Bernoulli signs. Let $W$ be a (deterministic)
$d$-dimensional subspace of $\mathbf{R}^n$ for some $0 \le d < n$. In this section we collect a
number of estimates concerning the distribution of the distance $\mathrm{dist}(X, W)$ from $X$
to $W$, which we will then combine to prove Theorem 1.2 and Theorem 1.1.

We have the crude estimate

$$0 \le \mathrm{dist}(X, W) \le \mathrm{dist}(X, 0) = \sqrt{n};$$

later we shall see that $\mathrm{dist}(X, W)$ is in fact concentrated around $\sqrt{n - d}$ (see Lemma
2.2).

We next recall a simple observation of Odlyzko.

Lemma 2.1. [10] $\mathbf{P}(\mathrm{dist}(X, W) = 0) \le 2^{d-n}$.

Proof. Since $W$ has dimension $d$ in $\mathbf{R}^n$, there is a set of $d$ coordinates which
determines all other $n - d$ coordinates of an element of $W$. But the corresponding
$n - d$ coordinates of $X$ are distributed uniformly in $\{-1, 1\}^{n-d}$ (thinking of the
other $d$ coordinates of $X$ as fixed). Thus the constraint $\mathrm{dist}(X, W) = 0$ can only
be obeyed with probability at most $2^{d-n}$, as desired. ■
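
Lemma 2.1 can be checked by brute force in low dimension. In the sketch below a subspace is described by a list of linearly independent normal vectors (so that $d = n$ minus the number of normals); this representation and the function name `fraction_in_subspace` are assumptions made for illustration.

```python
from itertools import product

def fraction_in_subspace(normals, n):
    """Fraction of points of {-1,1}^n orthogonal to every vector in `normals`."""
    hits = 0
    for x in product((-1, 1), repeat=n):
        if all(sum(a * b for a, b in zip(x, w)) == 0 for w in normals):
            hits += 1
    return hits / 2 ** n

n = 6
# W = {x : x_1 = x_2}, a hyperplane (d = n - 1); the bound 2^(d-n) = 1/2 is attained.
p_hyper = fraction_in_subspace([(1, -1, 0, 0, 0, 0)], n)
# W = {x : x_1 = x_2 and x_3 = x_4}, so d = n - 2 and the bound is 1/4.
p_codim2 = fraction_in_subspace([(1, -1, 0, 0, 0, 0), (0, 0, 1, -1, 0, 0)], n)
# W = {x : x_1 + ... + x_6 = 0}, again d = n - 1, but well below the bound 1/2.
p_balanced = fraction_in_subspace([(1,) * n], n)
print(p_hyper, p_codim2, p_balanced)
```

The first two examples show that the bound $2^{d-n}$ is sharp for subspaces defined by coordinate coincidences, which is the kind of exceptional subspace discussed at the start of Section 3.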

For a variant of Lemma 2.1 which gives a lower bound on $\mathrm{dist}(X, W)$ with high
probability, see Lemma 8.10. Next, we establish that $\mathrm{dist}(X, W)$ concentrates near
$\sqrt{n - d}$.

Lemma 2.2. Let $W$ be a fixed subspace of dimension $1 \le d \le n - 4$ and $X$ a
random ±1 vector. Then

$$\mathbf{E}(\mathrm{dist}(X, W)^2) = n - d. \qquad (4)$$

Furthermore, for any $t > 0$

$$\mathbf{P}(|\mathrm{dist}(X, W) - \sqrt{n-d}| \ge t + 2) \le 4 \exp(-t^2/16). \qquad (5)$$

Proof. Let $P = (p_{jk})_{1 \le j,k \le n}$ be the $n \times n$ orthogonal projection matrix from
$\mathbf{R}^n$ to $W$. Let $D = \mathrm{diag}(p_{11}, \ldots, p_{nn})$ be the diagonal component of $P$, and let
$A := P - D = (a_{jk})_{1 \le j,k \le n}$ be the off-diagonal component of $P$. Since $P$ is an
orthogonal projection matrix, we see that $A$ is real symmetric with zero diagonal.
If we write $X = (\epsilon_1, \ldots, \epsilon_n)$, then from Pythagoras's theorem we have

$$\mathrm{dist}(X,W)^2 = |X|^2 - |PX|^2 = n - \sum_{j=1}^n \sum_{k=1}^n \epsilon_j \epsilon_k p_{jk} = n - \mathrm{tr}(P) - \sum_{j=1}^n \sum_{k=1}^n \epsilon_j \epsilon_k a_{jk} = n - d - \sum_{j=1}^n \sum_{k=1}^n \epsilon_j \epsilon_k a_{jk}.$$

This already gives (4), since $a_{jk}$ vanishes on the diagonal. Set $Y := -\sum_{j=1}^n \sum_{k=1}^n \epsilon_j \epsilon_k a_{jk}$,
so that $\mathrm{dist}(X,W)^2 = n - d + Y$. It is easy to see that

$$\mathbf{E}(Y^2) = 2 \sum_{1 \le j,k \le n} a_{jk}^2 = 2\,\mathrm{tr}(A^2).$$

Observe that as $P$ is a projection matrix, the coefficients $p_{jk}$ are bounded in
magnitude by 1, and we have

$$\sum_{j=1}^n \sum_{k=1}^n p_{jk}^2 = \mathrm{tr}(P^2) = \mathrm{tr}(P) = d.$$

On the other hand

$$\sum_{j=1}^n p_{jj} = \mathrm{tr}(P) = d,$$

so by Cauchy-Schwarz

$$\sum_{j=1}^n p_{jj}^2 \ge d^2/n.$$

This implies that

$$\mathrm{tr}(A^2) = \sum_{j=1}^n \sum_{k=1}^n p_{jk}^2 - \sum_{j=1}^n p_{jj}^2 \le d - d^2/n \le \min\{d, n - d\}.$$

Consider the event $\mathrm{dist}(X, W) \ge \sqrt{n - d} + 2$. The probability of this event is
bounded from above by

$$\mathbf{P}(\mathrm{dist}^2(X, W) \ge (n-d) + 4\sqrt{n-d}) = \mathbf{P}(Y \ge 4\sqrt{n-d}) \le \mathbf{P}(Y^2 \ge 16(n - d)).$$

By Markov's inequality

$$\mathbf{P}(Y^2 \ge 16(n-d)) \le \frac{\mathbf{E}(Y^2)}{16(n-d)} \le \frac{2(n-d)}{16(n-d)} = \frac{1}{8},$$

which implies that the median $M$ of $\mathrm{dist}(X, W)$ is at most $\sqrt{n - d} + 2$. To bound $M$
from below, consider the event $\mathrm{dist}(X, W) \le \sqrt{n - d} - 2$. By a similar argument,
the probability of this event is at most

$$\mathbf{P}(Y \le -4\sqrt{n-d} + 4) \le \mathbf{P}(Y^2 \ge 16(n - d) - 32\sqrt{n-d} + 16).$$

By Markov's inequality, the last probability is at most

$$\frac{2(n-d)}{16(n-d) - 32\sqrt{n-d} + 16} \le \frac{1}{2}$$

for all $d \le n - 4$. Thus, we can conclude that $|M - \sqrt{n - d}| \le 2$.

Since $\mathrm{dist}(X, W)$ is a convex function on $\{-1,1\}^n$ with Lipschitz coefficient 1,
Talagrand's inequality [12] implies that

$$\mathbf{P}(|\mathrm{dist}(X,W) - M| \ge t) \le 4\exp(-t^2/16)$$

for any $t > 0$. Since $|M - \sqrt{n - d}| \le 2$, Lemma 2.2 follows.
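
The identity (4) can be confirmed exactly in a small case by averaging $\mathrm{dist}(X, W)^2$ over all sign vectors. The sketch below builds an orthonormal basis of $W$ by Gram-Schmidt in plain Python; the spanning vectors in `W` are arbitrary illustrative choices, not data from the paper.

```python
from itertools import product
from math import sqrt

def orthonormal_basis(vectors):
    """Gram-Schmidt: orthonormal basis of the span (dependent vectors are dropped)."""
    basis = []
    for v in vectors:
        w = [float(c) for c in v]
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = sqrt(sum(x * x for x in w))
        if norm > 1e-9:
            basis.append([x / norm for x in w])
    return basis

def dist_squared(x, basis):
    """Squared distance from x to span(basis), computed as |x|^2 - |Px|^2."""
    projected = sum(sum(a * b for a, b in zip(x, e)) ** 2 for e in basis)
    return sum(c * c for c in x) - projected

def mean_dist_squared(spanning_vectors, n):
    """Average of dist(X, W)^2 over all X in {-1,1}^n, together with d = dim W."""
    basis = orthonormal_basis(spanning_vectors)
    total = sum(dist_squared(x, basis) for x in product((-1, 1), repeat=n))
    return total / 2 ** n, len(basis)

n = 8
W = [(1, 2, 0, -1, 3, 0, 1, 1), (0, 1, 1, 1, -2, 4, 0, 1), (2, 0, 0, 1, 1, 1, -1, 5)]
mean_sq, d = mean_dist_squared(W, n)
# The first number should equal n - d up to floating-point rounding.
print(mean_sq, n - d)
```

Only the second moment is checked here; the concentration statement (5) relies on Talagrand's inequality and is not something a small enumeration can establish.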
Remarks 2.3. One can deduce a concentration result similar to Lemma 2.2 using
the high moment method; there is also a slightly weaker statement that can be
obtained from Bonami's inequality [2].

One can have a similar statement for the cases $d = n - 3$, $n - 2$ and $n - 1$. In these
cases $\sqrt{n - d} < 2$, so the event $\mathrm{dist}(X, W) \le \sqrt{n - d} - 2$ holds with probability
zero. So the median $M$ is between 0 and 3. Therefore, in these cases

$$\mathbf{P}(\mathrm{dist}(X, W) \ge 3 + t) \le 4\exp(-t^2/16).$$



3. The distance between a random vector and a random subspace

The estimates in the last section are quite accurate when $n - d$ is sufficiently large,
but do not provide much useful information when $n - d$ is small (e.g., $n - d = 2$).
For instance, they do not show that the distance is (with high probability) not zero
in this case. Indeed, there are some exceptional spaces $W$ (e.g. the hyperplane of
points $(x_1, \ldots, x_n)$ with $x_1 = x_2$) which capture a very large fraction of the points
in $\{-1, 1\}^n$. However, in our applications $W$ is a subspace spanned by random
vectors and will thus "typically" not be of the exceptional form described above, in
which the unit normal contains many zero coordinates. In such a case we can still
recover good lower bounds on $\mathrm{dist}(X, W)$ with high probability. More precisely, we
have

Lemma 3.1. Let $X$ be a random vector in $\{-1, 1\}^n$, let $1 \le d \le n - 1$ and $W$ a
space spanned by $d$ random vectors in $\{-1, 1\}^n$, chosen independently of each other
and of $X$. Then we have

$$\mathbf{P}\Big(\mathrm{dist}(X, W) \le \frac{1}{4n}\Big) = O(1/\sqrt{\ln n}).$$

Remark 3.2. In fact, as $W$ is spanned by random vectors, we can fix $X$. The above
formulation is, however, more convenient for the proof.

The remainder of this section will be devoted to the proof of Lemma 3.1.

Let $1 \le l \le n$. We say that $W$ is $l$-typical if any unit vector $(w_1, \ldots, w_n) \in W^\perp$
has at least $l$ coordinates whose absolute values are at least $\frac{1}{2n}$. In order to prove
Lemma 3.1, we need the following

Lemma 3.3 ($\mathrm{dist}(X, W)$ is large for typical $W$). Let $W$ be a (deterministic)
subspace which is $l$-typical for some $1 \le l \le n$. Then

$$\mathbf{P}\Big(\mathrm{dist}(X, W) \le \frac{1}{4n}\Big) \le O(1/\sqrt{l}).$$

Proof. By hypothesis and symmetry, we may assume without loss of generality
that there is a unit normal $(w_1, \ldots, w_n) \in W^\perp$ such that $|w_1|, \ldots, |w_l| \ge \frac{1}{2n}$. We
then see that

$$\mathbf{P}\Big(\mathrm{dist}(X, W) \le \frac{1}{4n}\Big) \le \mathbf{P}\Big(|\epsilon_1 w_1 + \ldots + \epsilon_n w_n| \le \frac{1}{4n}\Big)$$
$$\le \sup_x \mathbf{P}\Big(|\epsilon_1 w_1 + \ldots + \epsilon_l w_l - x| \le \frac{1}{4n}\Big)$$
$$= \sup_y \mathbf{P}(\epsilon_1 2n w_1 + \ldots + \epsilon_l 2n w_l \in [y, y + 1])$$

where we have made the substitutions $x := -\sum_{l < j \le n} \epsilon_j w_j$ and $y := 2nx - \frac{1}{2}$ respectively.
To conclude the claim, we invoke the following variant of the Littlewood-Offord
lemma, due to Erdős [3]:

Lemma 3.4. [3] Let $a_1, \ldots, a_k$ be real numbers with absolute values larger than
one. Then for any interval $I$ of length at most one

$$\mathbf{P}\Big(\sum_{i=1}^k a_i \epsilon_i \in I\Big) = O(1/\sqrt{k}).$$

This lemma was proved by Erdős using Sperner's lemma. The reader may want to
check Remark 7.2 for a different argument. Lemma 3.3 immediately follows. ■
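
Lemma 3.4 can be explored numerically for small $k$. The sketch below computes, by exhaustive enumeration, the largest probability that a signed sum lands in a closed window of length one, and compares it with the sharp constant $\binom{k}{\lfloor k/2 \rfloor} 2^{-k}$ from Erdős's theorem (which is of order $1/\sqrt{k}$). The function names and the sample coefficient lists are illustrative assumptions; integer coefficients are used so that the window comparison is exact.

```python
from itertools import product
from math import comb

def max_window_probability(a):
    """Largest probability that sum(a_i * eps_i) falls in a closed interval of length 1."""
    sums = sorted(
        sum(c * e for c, e in zip(a, eps))
        for eps in product((-1, 1), repeat=len(a))
    )
    best, left = 0, 0
    for right, s in enumerate(sums):
        # Shrink the window until it has length at most 1.
        while s - sums[left] > 1:
            left += 1
        best = max(best, right - left + 1)
    return best / 2 ** len(a)

def erdos_bound(k):
    """Sharp Littlewood-Offord bound for coefficients of magnitude at least 1."""
    return comb(k, k // 2) / 2 ** k

equal_case = max_window_probability([1] * 10)
mixed_case = max_window_probability([1, 2, 3, 5, 7, 11, 13, 17])
print(equal_case, erdos_bound(10))
print(mixed_case, erdos_bound(8))
```

Equal coefficients give the extremal case, while spreading the coefficients out makes the sums collide less often and the probability drops well below the bound.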



We are now ready to prove Lemma 3.1. 

Proof It suffices to prove the extremal case when W is spanned by n — 1 random 
vectors. Set I := . In light of Lemma 3.3, we see that it suffices to show that 

$$\mathbf{P}(W \text{ is not } l\text{-typical}) = O(1/\sqrt{\ln n}). \qquad (6)$$

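If $W$ is not $l$-typical, then there exists a unit vector $w$ orthogonal to $W$ with at
least $n - l$ coordinates which are less than $\frac{1}{2n}$ in magnitude. There are $\binom{n}{n-l} = \binom{n}{l}$
such possibilities for these coordinates. Thus by symmetry we have

$$\mathbf{P}(W \text{ is not } l\text{-typical}) \le \binom{n}{l} \mathbf{P}(W \perp w \text{ for some } w \in \Omega)$$

where $\Omega$ is the space of all unit vectors $w = (w_1, \ldots, w_n)$ such that $|w_j| < \frac{1}{2n}$ for
all $l < j \le n$.

Suppose that $w \in \Omega$ was such that $W \perp w$; then $X_i \perp w$ for all $1 \le i \le n - 1$.
Write $X_i = (\epsilon_{i,1}, \ldots, \epsilon_{i,n})$; then

$$\sum_{j=1}^n \epsilon_{i,j} w_j = 0.$$

Since $\epsilon_{i,j} = \pm 1$, and $|w_j| < 1/2n$ for $j > l$, we thus conclude from the triangle
inequality that

$$\Big|\sum_{j=1}^l \epsilon_{i,j} w_j\Big| \le \frac{n-l}{2n} \le \frac{1}{2}.$$

On the other hand, we have

$$\sum_{j=1}^l |w_j|^2 = 1 - \sum_{j=l+1}^n |w_j|^2 \ge 1 - (n-l)\Big(\frac{1}{2n}\Big)^2 \ge 1 - \frac{1}{n}.$$

Comparing these two inequalities, we see (for $n > 8$; the cases $n \le 8$ are
of course trivial) that for each $1 \le i \le n - 1$, at least one of the $\epsilon_{i,j} w_j$ has to
be negative. Thus, if we let $\epsilon_1, \ldots, \epsilon_l$ be signs such that $\epsilon_j w_j$ is positive for all
$1 \le j \le l$, we have

$$(\epsilon_{i,j})_{1 \le j \le l} \ne (\epsilon_j)_{1 \le j \le l} \text{ for all } 1 \le i \le n - 1.$$

Thus we have

$$\mathbf{P}(W \perp w \text{ for some } w \in \Omega) \le \sum_{\epsilon_1, \ldots, \epsilon_l \in \{-1,1\}} \mathbf{P}((\epsilon_{i,j})_{1 \le j \le l} \ne (\epsilon_j)_{1 \le j \le l} \text{ for all } 1 \le i \le n - 1).$$

Since the $\epsilon_{i,j}$ are i.i.d. Bernoulli variables, we have

$$\mathbf{P}((\epsilon_{i,j})_{1 \le j \le l} \ne (\epsilon_j)_{1 \le j \le l} \text{ for all } 1 \le i \le n - 1) = (1 - 2^{-l})^{n-1}.$$

Putting this all together, we obtain

$$\mathbf{P}(W \text{ is not } l\text{-typical}) \le \binom{n}{l} 2^l (1 - 2^{-l})^{n-1} \le n^{l+1} 2^l e^{-2^{-l}(n-1)},$$

and (6) follows by choice of $l$. This proves Lemma 3.1. ■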

As a consequence of this lemma, we derive a short proof of Theorem 1.2. Let
$X_1, \ldots, X_n$ be the row vectors of $M_n$ and $W_j$ be the subspace spanned by $X_1, \ldots, X_j$.
Observe that if $M_n$ is singular, then $X_1, \ldots, X_n$ are linearly dependent, and thus
we have $\mathrm{dist}(X_{j+1}, W_j) = 0$ for some $1 \le j \le n - 1$. Thus we have

$$\mathbf{P}(\det(M_n) = 0) \le \sum_{j=1}^{n-1} \mathbf{P}(\mathrm{dist}(X_{j+1}, W_j) = 0) = \sum_{j=1}^{n-1} \mathbf{P}(\mathrm{dist}(X, W_j) = 0).$$

From Lemma 2.1 we have $\mathbf{P}(\mathrm{dist}(X, W_j) = 0) \le 2^{j-n}$. Since $\mathbf{P}(\mathrm{dist}(X, W_j) = 0)$ is
clearly monotone increasing in $j$, we obtain the inequality

$$\mathbf{P}(\det(M_n) = 0) \le 2^{-k} + k\,\mathbf{P}(\mathrm{dist}(X, W_{n-1}) = 0)$$

for any $1 \le k \le n$. By the lemma just proved, $\mathbf{P}(\mathrm{dist}(X, W_{n-1}) = 0) = O(1/\sqrt{\ln n})$.
By choosing $k$ to be a suitable small power of $\ln n$ (any $k \to \infty$ with $k = o(\sqrt{\ln n})$ works), we obtain

$$\mathbf{P}(\det(M_n) = 0) \le 2^{-k} + O(k/\sqrt{\ln n}) = o(1),$$

completing the proof.



4. Proof of Theorem 1.1 



For an $n \times n$ matrix $A$, $|\det A|$ is the volume of the parallelepiped spanned by the
row vectors of $A$. If one instead expresses this volume in terms of base times height,
we obtain the factorization

$$|\det(M_n)| = \prod_{0 \le j \le n-1} \mathrm{dist}(X_{j+1}, W_j).$$

To estimate this quantity, we shall simply control each of the factors $\mathrm{dist}(X_{j+1}, W_j)$
separately, using the estimates obtained in the previous two sections.
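
The base-times-height factorization can be verified on concrete matrices. The sketch below computes the successive distances $\mathrm{dist}(X_{j+1}, W_j)$ by Gram-Schmidt and compares their product with a determinant obtained by cofactor expansion; the two test matrices and all function names are illustrative choices.

```python
from math import prod, sqrt

def successive_distances(rows):
    """dist(X_{j+1}, W_j) for j = 0..n-1, where W_j = span(X_1, ..., X_j)."""
    basis, dists = [], []
    for row in rows:
        w = [float(c) for c in row]
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = sqrt(sum(x * x for x in w))
        dists.append(norm)
        if norm > 1e-9:
            basis.append([x / norm for x in w])
    return dists

def det(rows):
    """Integer determinant by cofactor expansion along the first row."""
    if len(rows) == 1:
        return rows[0][0]
    return sum(
        (-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
        for c in range(len(rows))
    )

hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
generic = [[1, 1, 1, 1], [1, 1, -1, 1], [1, -1, 1, 1], [-1, 1, 1, 1]]
for m in (hadamard, generic):
    # Product of distances versus |det|; the two should agree up to rounding.
    print(prod(successive_distances(m)), abs(det(m)))
```

For the Hadamard matrix every row is orthogonal to the earlier ones, so each distance equals $\sqrt{n} = 2$ and the product attains Hadamard's bound $n^{n/2} = 16$.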

We may assume $n$ is large. Set $d_0 := n - \ln^{c} n$ for a suitable fixed exponent $0 < c < 1/2$. For $1 \le j \le d_0$, define

$$\gamma_j := 7\sqrt{\frac{\ln(n - j)}{n - j}}.$$

It is trivial that all $\gamma_j$ are bounded from above by 1/2 if $n$ is sufficiently large.
Consider $1 \le j \le d_0$. Assuming that $W_j$ has dimension $j$, by Lemma 2.2 we have
that the probability that the distance

$$\mathrm{dist}(X_{j+1}, W_j) \le (1 - \gamma_j)\sqrt{n - j}$$

is at most

$$4\exp(-\gamma_j^2 (n - j)/16) = 4\exp\Big(-\frac{49}{16}\ln(n - j)\Big) \le (n - j)^{-3}$$

provided that $n - j$ is sufficiently large. This implies that with probability at least

$$1 - \sum_{j=1}^{d_0} (n - j)^{-3} = 1 - o(1)$$

the distance $\mathrm{dist}(X_{j+1}, W_j)$ is at least $(1 - \gamma_j)\sqrt{n - j}$, for every $1 \le j \le d_0$. (Notice
that if $\mathrm{dist}(X_{j+1}, W_j) > 0$ then $W_{j+1}$ has full dimension $j + 1$.)

For $d_0 < j \le n - 1$, we are going to use Lemma 3.1 to estimate the distances. By
this lemma, we have that with probability at least

$$1 - \sum_{d_0 < j < n} O\Big(\frac{1}{\sqrt{\ln n}}\Big) = 1 - o(1)$$

the distance $\mathrm{dist}(X_{j+1}, W_j)$ is at least $\frac{1}{4n}$ for every $d_0 < j \le n - 1$. (In fact, the
bound holds for all $1 \le j \le n - 1$.)

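Combining the two estimates on distances, we see that with probability $1 - o(1)$,

$$\prod_{0 \le j \le n-1} \mathrm{dist}(X_{j+1}, W_j) \ge \sqrt{n}\,\Big(\prod_{j=1}^{d_0} (1 - \gamma_j)\sqrt{n - j}\Big)\Big(\frac{1}{4n}\Big)^{n - d_0 - 1}.$$

Since $n - d_0 = o(\ln n)$, the error term $(\frac{1}{4n})^{n - d_0 - 1}$, together with the factors
$\prod_{d_0 < j \le n-1} \sqrt{n - j} \le n^{(n - d_0)/2}$ of $\sqrt{n!}$ that are missing from the product, costs
only a factor of $\exp(-o(\ln^2 n))$.

The main error term comes from the product $\prod_{j=1}^{d_0} (1 - \gamma_j)$. By definition of
$\gamma_j$ and the fact that all $\gamma_j$ are less than 1/2, we have

$$\prod_{j=1}^{d_0} (1 - \gamma_j) \ge \exp\Big(-2\sum_{j=1}^{d_0} \gamma_j\Big) = \exp\Big(-14\sum_{j=1}^{d_0} \sqrt{\frac{\ln(n - j)}{n - j}}\Big).$$

We use a rough estimate that

$$\sum_{j=1}^{d_0} \sqrt{\frac{\ln(n - j)}{n - j}} \le 2 n^{1/2} \ln^{1/2} n.$$

Putting these together, we obtain, with probability $1 - o(1)$, that

$$\prod_{0 \le j \le n-1} \mathrm{dist}(X_{j+1}, W_j) \ge \sqrt{n!}\,\exp(-28 n^{1/2} \ln^{1/2} n + o(\ln^2 n)) \ge \sqrt{n!}\,\exp(-29 n^{1/2} \ln^{1/2} n),$$

proving the theorem. □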



ON RANDOM ±1 MATRICES: SINGULARITY AND DETERMINANT 






5. Proof of Theorem 1.5 

In this section, we denote $N := 2^n$. Our goal is to prove that $\mathbf{P}(\det M_n = 0) \le
N^{-(1+o(1))\epsilon}$, where $\epsilon$ is as in Theorem 1.5.

Notice that if $M_n$ is singular, then $X_1, \ldots, X_n$ span a proper subspace $V$ of $\mathbf{R}^n$. The
first (fairly simple) observation is that we can restrict to the case where $V$ is a hyperplane,
thanks to the following lemma:

Lemma 5.1. [6] We have

$$\mathbf{P}(X_1, \ldots, X_n \text{ linearly dependent}) \le N^{o(1)}\,\mathbf{P}(X_1, \ldots, X_n \text{ span a hyperplane}).$$

Remark 5.2. One can replace $N^{o(1)}$ by $1 + o(1)$, but this refinement has no
significance in the current situation.

Proof. If $X_1, \ldots, X_n$ are linearly dependent, then there must exist $0 \le d \le n - 1$
such that $X_1, \ldots, X_{d+1}$ span a space of dimension exactly $d$. Since the number of
possible $d$ is at most $n = N^{o(1)}$, it thus suffices to show that

$$\mathbf{P}(X_1, \ldots, X_{d+1} \text{ span a space of dimension exactly } d) \le \mathrm{const} \times \mathbf{P}(X_1, \ldots, X_n \text{ span a hyperplane})$$

for each fixed $d$. However, from Lemma 2.1 we see that

$$\mathbf{P}(X_1, \ldots, X_{d+2} \text{ span a space of dimension exactly } d + 1 \mid X_1, \ldots, X_{d+1} \text{ span a space of dimension exactly } d) \ge 1 - 2^{d-n},$$

and so the claim follows from $n - d - 1$ applications of Bayes' identity. ■

In view of this lemma, it suffices to show

$$\sum_{V:\, V \text{ hyperplane}} \mathbf{P}(X_1, \ldots, X_n \text{ span } V) \le N^{-\epsilon + o(1)}.$$

Clearly, we may restrict our attention to those hyperplanes $V$ which are spanned
by their intersection with $\{-1, 1\}^n$. Let us call such hyperplanes non-trivial.
Furthermore, we call a hyperplane $H$ degenerate if there is a vector $v$ orthogonal to $H$
and at most $\log \log n$ coordinates of $v$ are non-zero.

Fix a hyperplane $V$. Clearly we have

$$\mathbf{P}(X_1, \ldots, X_n \text{ span } V) \le \mathbf{P}(X_1, \ldots, X_n \in V) = \mathbf{P}(X \in V)^n. \qquad (7)$$

The contribution of the degenerate hyperplanes is negligible, thanks to the following
easy lemma (cf. the proof of (6)):

Lemma 5.3. The number of degenerate non-trivial hyperplanes is at most $N^{o(1)}$.

Proof. If $V$ is degenerate, then there is an integer normal vector $v = (v_1, \ldots, v_n)$
with at most $\log \log n$ non-zero entries. There are $\sum_{k \le \log \log n} \binom{n}{k} \le \log \log n \cdot n^{\log \log n} \le
N^{o(1)}$ possible places for the non-zero entries. By relabeling if necessary we may
assume that it is $v_1, \ldots, v_k$ which are non-zero for some $1 \le k \le \log \log n$. Let
$\pi : \{-1,1\}^n \to \{-1,1\}^k$ be the obvious projection map. Then $V$ is determined
by the projections $\{\pi(X_1), \ldots, \pi(X_n)\}$, which are a subset of $\{-1, 1\}^k$. The
number of such subsets is at most $2^{2^k} \le 2^{2^{\log \log n}} = N^{o(1)}$, and the claim follows^2.



By Lemma 2.1, $\mathbf{P}(X \in V)$ is at most 1/2 for any hyperplane $V$, so by (7) and Lemma 5.3 the contribution
of the degenerate non-trivial hyperplanes to $\mathbf{P}(\det M_n = 0)$ is only $N^{-1+o(1)}$.

Following [6], it will be useful to specify the magnitude of $\mathbf{P}(X \in V)$. For each
non-trivial hyperplane $V$, define the discrete codimension $d(V)$ of $V$ to be the unique
integer multiple of $1/n$ such that

$$N^{-\frac{d(V)}{n} - \frac{1}{n^2}} < \mathbf{P}(X \in V) \le N^{-\frac{d(V)}{n}}. \qquad (8)$$

Thus $d(V)$ is large when $V$ contains few elements from $\{-1, 1\}^n$, and conversely.
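
Since $N = 2^n$, condition (8) says $2^{-d(V) - 1/n} < \mathbf{P}(X \in V) \le 2^{-d(V)}$, so $n\,d(V)$ is the integer part of $-n \log_2 \mathbf{P}(X \in V)$. The following sketch turns this into a one-line computation; the function name and the sample probabilities are illustrative assumptions, and floating-point rounding could matter for probabilities extremely close to a threshold.

```python
from math import floor, log2

def discrete_codimension(p, n):
    """The multiple d of 1/n with 2^(-d - 1/n) < p <= 2^(-d), i.e. (8) with N = 2^n."""
    return floor(-n * log2(p)) / n

# Hyperplane x_1 = x_2 in any dimension: P(X in V) = 1/2, so d(V) = 1.
print(discrete_codimension(0.5, 6))
# Hyperplane x_1 + ... + x_6 = 0: P(X in V) = 20/64, which gives d(V) = 10/6.
print(discrete_codimension(20 / 64, 6))
```

A probability that is an exact power of two sits at the upper end of its bracket in (8), which is why the first example returns exactly 1.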

We denote by $\Omega_d$ the set of all non-degenerate, non-trivial hyperplanes with discrete
codimension $d$. It is simple to see that $1 \le d(V) \le n$ for all non-trivial $V$. In
particular, there are at most $O(n^2) = N^{o(1)}$ possible values of $d$, so to prove our
theorem it suffices to prove that

$$\sum_{V \in \Omega_d} \mathbf{P}(X_1, \ldots, X_n \text{ span } V) \le N^{-\epsilon + o(1)} \qquad (9)$$

for all $1 \le d \le n$. (Our errors $o(1)$ shall decay to zero as $n \to \infty$ uniformly in the
choice of $d$.)

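We first handle the (simpler) case when $d$ is large. Note that if $X_1, \ldots, X_n$ span
$V$, then some subset of $n - 1$ vectors already spans $V$. By symmetry, we have

$$\sum_{V \in \Omega_d} \mathbf{P}(X_1, \ldots, X_n \text{ span } V) \le n \sum_{V \in \Omega_d} \mathbf{P}(X_1, \ldots, X_{n-1} \text{ span } V)\,\mathbf{P}(X_n \in V)$$
$$\le n N^{-d/n} \sum_{V \in \Omega_d} \mathbf{P}(X_1, \ldots, X_{n-1} \text{ span } V) \le n N^{-d/n} = N^{-d/n + o(1)},$$

where the last inequality holds because $X_1, \ldots, X_{n-1}$ can span at most one hyperplane.
This disposes of the case when $d \ge (\epsilon - o(1))n$. Thus to prove Theorem 1.5 it will
now suffice to prove

Lemma 5.4. If $d$ is any integer multiple of $1/n$ such that

$$1 \le d \le (\epsilon - o(1))n \qquad (10)$$

then we have

$$\sum_{V \in \Omega_d} \mathbf{P}(X_1, \ldots, X_n \text{ span } V) \le N^{-\epsilon + o(1)}.$$

This is the objective of the next section.

^2 The estimates in the proof of Lemma 5.3 were extremely crude. In fact, as shown in [6], one can replace $\log \log n$
with a quantity as high as $n - 3 \log_2 n$ and still achieve the same result.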



6. Proof of Lemma 5.4 

The key idea in [6] is to find a new kind of random vector which is more
concentrated on hyperplanes in $\Omega_d$ (with small $d$) than (±1) vectors. Roughly speaking,
if we can find a random vector $Y$ such that for any $V \in \Omega_d$

$$\mathbf{P}(X \in V) \le c\,\mathbf{P}(Y \in V)$$

for some $0 < c < 1$, then, intuitively, one may expect that

$$\mathbf{P}(X_1, \ldots, X_n \text{ span } V) \le c^n\,\mathbf{P}(Y_1, \ldots, Y_n \text{ span } V) \qquad (11)$$

where $X_i$ and $Y_i$ are independent samples of $X$ and $Y$, respectively. Since $Y_1, \ldots, Y_n$
can only span at most one hyperplane $V$, one can then hope to conclude a bound
of $O(c^n)$ for the probability that $X_1, \ldots, X_n$ span a hyperplane.

While (11) may be too optimistic (because the samples of $Y$ on $V$ may be too
linearly dependent), it has turned out that something a little bit weaker can be obtained,
with a proper definition of $Y$. We next present this important definition.

Definition 6.1. For any $0 \le \mu \le 1$, let $\eta^{(\mu)} \in \{-1, 0, 1\}$ be a random variable
which takes the values $+1$ or $-1$ with probability $\mu/2$ each, and $0$ with probability $1 - \mu$. Let
$X^{(\mu)} \in \{-1, 0, 1\}^n$ be a random variable of the form $X^{(\mu)} = (\eta_1^{(\mu)}, \ldots, \eta_n^{(\mu)})$, where
the $\eta_j^{(\mu)}$ are i.i.d. random variables with the same distribution as $\eta^{(\mu)}$.

Thus $X^{(1)}$ has the same distribution as $X$, while $X^{(0)}$ is concentrated purely at
the origin. The other random variables $X^{(\mu)}$ have an intermediate behavior. We
shall work with $X^{(\mu)}$ for $\mu := 1/16$; this is not the optimal value of $\mu$ but is the
cleanest to work with. For this value of $\mu$ we have the crucial inequality, following
a Fourier-analytic argument of Halász [5] (see also [6]).

Lemma 6.2. Let $V$ be a non-degenerate non-trivial hyperplane. Then we have

$$\mathbf{P}(X \in V) \le \Big(\frac{1}{2} + o(1)\Big)\,\mathbf{P}(X^{(1/16)} \in V).$$
This lemma can be viewed as an assertion that any subspace which contains many
points from $\{-1, 1\}^n$ must necessarily contain several further points from $\{-1, 0, 1\}^n$
(weighted appropriately). We will prove this lemma in the next section.

Remark 6.3. One can obtain similar results for smaller values of $\mu$ than 1/16; for
instance this was achieved in [6] for the value $\mu := \frac{1}{108} e^{-1/108}$, eventually resulting
in their final gain $\epsilon := .001$ in Theorem 1.4. However the smaller one makes $\mu$,
the smaller the final bound on $\epsilon$; indeed, most of the improvement in our bounds
over those in [6] comes from increasing the value of $\mu$. One can increase the 1/16
parameter somewhat at the expense of worsening the 1/2 factor; in fact one can
increase 1/16 all the way to 1/4 but at the cost of replacing 1/2 with 1. This shows
that $(3/4 + o(1))^n$ is the limit of our method. We have actually been able to attain
this limit; see [11].

Let $V$ be a hyperplane in $\Omega_d$ for some $d$ obeying the bound in Lemma 5.4. Let $\gamma$
denote the quantity

$$\gamma := \frac{d}{n \log_2 (16/15)}; \qquad (12)$$

note from (2) and (10) that $0 < \gamma < 1$. Let $\epsilon' := \min(\epsilon, \gamma)$.

Consider the event that the i.i.d. random vectors $X_1, \ldots, X_{(1-\gamma)n}, X'_1, \ldots, X'_{(\gamma - \epsilon')n}$
are linearly independent in $V$ (we omit the rounding, which plays no significant role).
One can lower bound the probability of this event by the probability that all $X_i$
and all $X'_j$ belong to $V$, which is

$$\mathbf{P}(X \in V)^{(1 - \epsilon')n} = N^{-(1 - \epsilon')d - o(1)}.$$

Let us replace $X_i$ by $X_i^{(1/16)}$ for $1 \le i \le (1 - \gamma)n$ and consider the event $A_V$
that $X_1^{(1/16)}, \ldots, X_{(1-\gamma)n}^{(1/16)}, X'_1, \ldots, X'_{(\gamma - \epsilon')n}$ are linearly independent in $V$. Using
Lemma 6.2, we are able to give a much better lower bound for this event:

$$\mathbf{P}(A_V) \ge N^{(1 - \gamma) - (1 - \epsilon')d - o(1)}. \qquad (13)$$

The critical gain is the term $N^{(1 - \gamma)}$. In a sense, this gain is expected since $X^{(1/16)}$
is much more concentrated on $V$ than $X$. We will prove (13) at the end of the
section. Let us now use it to conclude the proof^3 of Lemma 5.4.

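Fix $V \in \Omega_d$. Let us denote by $B_V$ the event that $X_1, \ldots, X_n$ span $V$. To prove
Lemma 5.4, we thus need to show

$$\sum_{V \in \Omega_d} \mathbf{P}(B_V) \le N^{-\epsilon + o(1)}.$$

The idea is to use (13) to "replace" some of the $X_1, \ldots, X_n$ with the random
variables $X_1^{(1/16)}, \ldots, X_{(1-\gamma)n}^{(1/16)}$, which are more concentrated on $V$, in order to
obtain an exponential type gain. Since $A_V$ and $B_V$ are independent, we have,
by (13), that

$$\mathbf{P}(B_V) = \frac{\mathbf{P}(A_V \wedge B_V)}{\mathbf{P}(A_V)} \le N^{-(1 - \gamma) + (1 - \epsilon')d + o(1)}\,\mathbf{P}(A_V \wedge B_V).$$

Consider a set

$$X_1^{(1/16)}, \ldots, X_{(1-\gamma)n}^{(1/16)}, X'_1, \ldots, X'_{(\gamma - \epsilon')n}, X_1, \ldots, X_n$$

of vectors satisfying $A_V \wedge B_V$. Then there exist $\epsilon' n - 1$ vectors $X_{j_1}, \ldots, X_{j_{\epsilon' n - 1}}$
inside $X_1, \ldots, X_n$ which, together with $X_1^{(1/16)}, \ldots, X_{(1-\gamma)n}^{(1/16)}, X'_1, \ldots, X'_{(\gamma - \epsilon')n}$, span
$V$. Since the number of possible indices $(j_1, \ldots, j_{\epsilon' n - 1})$ is $\binom{n}{\epsilon' n - 1} = N^{h(\epsilon') + o(1)}$, by
conceding a factor of $N^{h(\epsilon') + o(1)}$, we can assume that $j_i = i$ for all relevant $i$. Let $C_V$
be the event that $X_1^{(1/16)}, \ldots, X_{(1-\gamma)n}^{(1/16)}, X'_1, \ldots, X'_{(\gamma - \epsilon')n}, X_1, \ldots, X_{\epsilon' n - 1}$ span $V$.
Then we have

$$\mathbf{P}(B_V) \le N^{-(1 - \gamma) + (1 - \epsilon')d + h(\epsilon') + o(1)}\,\mathbf{P}(C_V \wedge (X_{\epsilon' n}, \ldots, X_n \text{ in } V)).$$

On the other hand, $C_V$ and the event $(X_{\epsilon' n}, \ldots, X_n \text{ in } V)$ are independent, so

$$\mathbf{P}(C_V \wedge (X_{\epsilon' n}, \ldots, X_n \text{ in } V)) = \mathbf{P}(C_V)\,\mathbf{P}(X \in V)^{(1 - \epsilon')n + 1}.$$

Putting the last two estimates together we obtain

$$\mathbf{P}(B_V) \le N^{-(1 - \gamma) + (1 - \epsilon')d + h(\epsilon') + o(1)}\,N^{-((1 - \epsilon')n + 1)d/n}\,\mathbf{P}(C_V) = N^{-(1 - \gamma) + h(\epsilon') - d/n + o(1)}\,\mathbf{P}(C_V).$$

Since any set of vectors can only span a single space $V$, we have $\sum_{V \in \Omega_d} \mathbf{P}(C_V) \le 1$.
Thus, by summing over $\Omega_d$, we have

$$\sum_{V \in \Omega_d} \mathbf{P}(B_V) \le N^{-(1 - \gamma) + h(\epsilon') - d/n + o(1)}.$$

We can rewrite the right hand side using (12) as $N^{h(\epsilon') + \frac{d}{n}(\frac{1}{\log_2 16/15} - 1) - 1 + o(1)}$. Since
$\frac{1}{\log_2 16/15} - 1 > 0$, $d/n \le \epsilon$, and $h$ is monotone in the interval $0 < \epsilon' \le \epsilon < 1/2$, we
obtain

$$\sum_{V \in \Omega_d} \mathbf{P}(B_V) \le N^{h(\epsilon) + \epsilon(\frac{1}{\log_2 16/15} - 1) - 1 + o(1)},$$

and the claim follows from the definition of $\epsilon$ in (2). □

^3 The argument above is a simplified version of a hypergraph covering argument used in [6].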

In the rest of this section, we prove (13). The proof of Lemma 6.2, which uses
entirely different arguments (based on Fourier analysis), will be presented in the
next section.

To prove (13), first notice that the right hand side is the probability of the event $A'_V$
that $X_1^{(1/16)}, \ldots, X_{(1-\gamma)n}^{(1/16)}, X'_1, \ldots, X'_{(\gamma - \epsilon')n}$ belong to $V$. Thus, by Bayes' identity
it is sufficient to show that

$$\mathbf{P}(A_V \mid A'_V) = N^{o(1)}.$$

This is an estimate similar to that in Lemma 5.1, and we prove it by similar
arguments. From (8) we have

$$\mathbf{P}(X \in V) = (1 + O(1/n))\,2^{-d} \qquad (14)$$

and hence by Lemma 6.2

$$\mathbf{P}(X^{(1/16)} \in V) \ge (2 + O(1/n))\,2^{-d}. \qquad (15)$$

On the other hand, by a trivial modification of the proof of Lemma 2.1 we have

$$\mathbf{P}(X^{(1/16)} \in W) \le (15/16)^{n - \dim(W)}$$

for any subspace $W$. By Bayes' identity we thus have the conditional probability
bound

$$\mathbf{P}(X^{(1/16)} \in W \mid X^{(1/16)} \in V) \le (2 + O(1/n))\,2^{d}\,(15/16)^{n - \dim(W)}.$$

This is non-trivial when $\dim(W) \le (1 - \gamma)n$ thanks to (12).

Let $E_k$ be the event that $X_1^{(1/16)}, \ldots, X_k^{(1/16)}$ are independent. The above estimates
imply that

$$\mathbf{P}(E_{k+1} \mid E_k \wedge A'_V) \ge 1 - (2 + O(1/n))\,2^{d}\,(15/16)^{n - k}$$

for all $0 \le k \le (1 - \gamma)n$. Applying Bayes' identity repeatedly (and (12)) we thus
obtain

$$\mathbf{P}(E_{(1 - \gamma)n} \mid A'_V) \ge N^{-o(1)}.$$

If $\gamma \le \epsilon$ then we are now done, so suppose $\gamma > \epsilon$ (so that $\epsilon' = \epsilon$). From Lemma
2.1 we have

$$\mathbf{P}(X \in W) \le (1/2)^{n - \dim(W)}$$

for any subspace $W$, and hence by (14)

$$\mathbf{P}(X \in W \mid X \in V) \le (1 + O(1/n))\,2^{d}\,(1/2)^{n - \dim(W)}.$$

Let us assume $E_{(1 - \gamma)n}$ and denote by $W$ the $(1 - \gamma)n$-dimensional subspace spanned
by $X_1^{(1/16)}, \ldots, X_{(1-\gamma)n}^{(1/16)}$. Let $U_k$ denote the event that $X'_1, \ldots, X'_k, W$ are independent.
We have

$$p_k := \mathbf{P}(U_{k+1} \mid U_k \wedge A'_V) \ge 1 - (1 + O(1/n))\,2^{d}\,(1/2)^{n - k - (1 - \gamma)n} = 1 - (1 + O(1/n))\,2^{d - \epsilon n}\,2^{k - (\gamma - \epsilon)n}$$

for all $0 \le k < (\gamma - \epsilon)n$, where the factor $2^{d - \epsilon n}$ is small thanks to (10). Thus by Bayes' identity we obtain

$$\prod_{0 \le k < (\gamma - \epsilon)n} p_k \ge N^{-o(1)}$$

as desired. □



7. Halasz-type arguments 

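We now prove Lemma 6.2. The first step is to use Fourier analysis as in [5] to obtain
usable formulae for $\mathbf{P}(X \in V)$ and $\mathbf{P}(X^{(\mu)} \in V)$. Let $v \in \mathbf{Z}^n \setminus \{0\}$ be a normal
vector to $V$ with integer coefficients (such a vector exists since $V$ is spanned by the
integer points $V \cap \{-1, 1\}^n$). By hypothesis, at least $\log \log n$ of the coordinates of
$v$ are non-zero.

We first observe that the probability $\mathbf{P}(X^{(\mu)} \in V)$ can be computed using the
Fourier transform:

$$\mathbf{P}(X^{(\mu)} \in V) = \mathbf{P}(X^{(\mu)} \cdot v = 0) = \mathbf{E}\Big(\int_0^1 e^{2\pi i \xi X^{(\mu)} \cdot v}\,d\xi\Big) = \int_0^1 \prod_{j=1}^n \mathbf{E}\big(e^{2\pi i \xi \eta_j^{(\mu)} v_j}\big)\,d\xi = \int_0^1 \prod_{j=1}^n ((1 - \mu) + \mu \cos(2\pi \xi v_j))\,d\xi.$$

Applying this with $\mu = 1/16$ we obtain

$$\mathbf{P}(X^{(1/16)} \in V) = \int_0^1 \prod_{j=1}^n \Big(\frac{15}{16} + \frac{1}{16}\cos(2\pi \xi v_j)\Big)\,d\xi.$$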
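
The Fourier formula above can be tested against direct enumeration for a small integer normal vector. In the sketch below the integral is replaced by an average over $m$ equally spaced points, which is exact once $m$ exceeds $\sum_j |v_j|$ because the integrand is a trigonometric polynomial of at most that degree. The normal vector `v` and the function names are illustrative assumptions, and the example is far too small to be "non-degenerate" in the asymptotic sense of Lemma 6.2.

```python
from itertools import product
from math import cos, pi

def prob_direct(v, mu):
    """P(X^(mu) . v = 0) by enumerating all outcomes in {-1,0,1}^n."""
    weight = {-1: mu / 2, 0: 1 - mu, 1: mu / 2}
    total = 0.0
    for x in product((-1, 0, 1), repeat=len(v)):
        if sum(a * b for a, b in zip(x, v)) == 0:
            p = 1.0
            for a in x:
                p *= weight[a]
            total += p
    return total

def prob_fourier(v, mu):
    """Same probability from the integral of prod((1-mu) + mu*cos(2*pi*xi*v_j)).

    Averaging over m > sum|v_j| equally spaced points evaluates the integral exactly.
    """
    m = sum(abs(c) for c in v) + 1
    total = 0.0
    for t in range(m):
        term = 1.0
        for c in v:
            term *= (1 - mu) + mu * cos(2 * pi * t * c / m)
        total += term
    return total / m

v = (1, 2, 3, -1, 1, 2)
p_sign = prob_fourier(v, 1.0)       # P(X in V) for the ±1 vector X
p_lazy = prob_fourier(v, 1 / 16)    # P(X^(1/16) in V)
print(p_sign, prob_direct(v, 1.0))
print(p_lazy, prob_direct(v, 1 / 16))
print(p_sign / p_lazy)
```

In this example the ratio printed on the last line is below 1/2, in the spirit of Lemma 6.2, because $X^{(1/16)}$ places most of its mass at the origin, which lies in every subspace.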
Applying instead with $\mu = 1$, we obtain

$$\mathbf{P}(X \in V) = \int_0^1 \prod_{j=1}^n \cos(2\pi \xi v_j)\,d\xi \le \int_0^1 \prod_{j=1}^n |\cos(2\pi \xi v_j)|\,d\xi = \int_0^1 \prod_{j=1}^n |\cos(\pi \xi v_j)|\,d\xi,$$

where the latter identity follows from the change of variables $\xi \mapsto \xi/2$ and noting
that $|\cos(\pi \xi v_j)|$ is still well-defined for $\xi \in [0, 1]$. Thus if we set

$$F(\xi) := \prod_{j=1}^n |\cos(\pi \xi v_j)|; \qquad G(\xi) := \prod_{j=1}^n \Big(\frac{15}{16} + \frac{1}{16}\cos(2\pi \xi v_j)\Big), \qquad (16)$$

it will now suffice to show that

$$\int_0^1 F(\xi)\,d\xi \le \Big(\frac{1}{2} + o(1)\Big) \int_0^1 G(\xi)\,d\xi. \qquad (17)$$
We now observe three estimates on $F$ and $G$.

Lemma 7.1. For any $\xi, \xi' \in [0, 1]$, we have the pointwise estimates

$$F(\xi) \le G(\xi)^4 \qquad (18)$$

and

$$F(\xi) F(\xi') \le G(\xi + \xi')^2 \qquad (19)$$

and the crude integral estimate

$$\int_0^1 G(\xi)\,d\xi \le o(1). \qquad (20)$$

Of course, all operations on $\xi$ and $\xi'$ such as $(\xi + \xi')$ in (19) are considered modulo
1.



Proof of Lemma 7.1. We first prove (18). From (16) it will suffice to prove the
pointwise inequality

$$|\cos \theta| \le \Big[\frac{15}{16} + \frac{1}{16}\cos 2\theta\Big]^4$$

for all $\theta \in \mathbf{R}$. Writing $\cos 2\theta = 1 - 2x$ for some $0 \le x \le 1$, we have $|\cos \theta| = (1 - x)^{1/2}$
and the inequality becomes

$$(1 - x)^{1/2} \le (1 - x/8)^4.$$

Introducing the function $f(x) := \log\big(\frac{1}{1 - x}\big)$, this inequality is equivalent to

$$\frac{f(x) - f(0)}{x - 0} \ge \frac{f(x/8) - f(0)}{x/8 - 0},$$

but this is immediate from the convexity of $f$.
Now we prove (19). It suffices to prove that

$$|\cos \theta|\,|\cos \theta'| \le \Big[\frac{15}{16} + \frac{1}{16}\cos(2(\theta + \theta'))\Big]^2$$

for all $\theta, \theta' \in \mathbf{R}$. As this inequality is periodic with period $\pi$ in both $\theta$ and $\theta'$ we
may assume that $|\theta|, |\theta'| < \pi/2$ (the cases when $\theta = \pi/2$ or $\theta' = \pi/2$ being trivial).
Next we observe from the concavity of $\log \cos(\theta)$ in the interval $(-\pi/2, \pi/2)$ that

$$\cos \theta \cos \theta' \le \cos^2 \frac{\theta + \theta'}{2} = \frac{1}{2} + \frac{1}{2}\cos(\theta + \theta').$$

Writing $\cos(\theta + \theta') = 1 - 2x$ for some $0 \le x \le 1$, we have $\cos 2(\theta + \theta') = 2(1 - 2x)^2 - 1 =
1 - 8x + 8x^2$, and our task is now to show that

$$1 - x \le (1 - (x - x^2)/2)^2 = 1 - x + x^2 + (x - x^2)^2/4,$$

but this is clearly true.
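
The two pointwise inequalities behind (18) and (19) are easy to probe numerically. The sketch below evaluates the worst violation over a grid of angles in $[-\pi/2, \pi/2]$; the grid resolution and the function names are illustrative assumptions, and a grid check is of course only a sanity test, not a proof.

```python
from math import cos, pi

def g(theta):
    """Single factor of G: 15/16 + (1/16) * cos(2 * theta)."""
    return 15 / 16 + cos(2 * theta) / 16

def worst_gaps(steps=400):
    """Largest value of LHS - RHS over the grid, for the inequalities behind (18) and (19)."""
    grid = [pi * (i / steps - 0.5) for i in range(steps + 1)]  # covers [-pi/2, pi/2]
    gap18 = max(abs(cos(t)) - g(t) ** 4 for t in grid)
    gap19 = max(abs(cos(s) * cos(t)) - g(s + t) ** 2 for s in grid for t in grid)
    return gap18, gap19

gap18, gap19 = worst_gaps()
# Both gaps should be non-positive up to rounding; equality occurs at theta = 0.
print(gap18, gap19)
```

Both inequalities are tight only at $\theta = \theta' = 0$, which matches the role of small frequencies $\xi$ in the integral estimate (17).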

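Now we prove (20). We know that at least $\log \log n$ of the $v_j$ are non-zero; without
loss of generality we may assume that it is $v_1, \ldots, v_K$ which are non-zero for some
$K \ge \log \log n$. Then we have by Hölder's inequality, followed by a rescaling by $v_j$,

$$\int_0^1 G(\xi)\,d\xi \le \int_0^1 \prod_{j=1}^{\log \log n} \Big(\frac{15}{16} + \frac{1}{16}\cos(2\pi \xi v_j)\Big)\,d\xi$$
$$\le \prod_{j=1}^{\log \log n} \Big(\int_0^1 \Big(\frac{15}{16} + \frac{1}{16}\cos(2\pi \xi v_j)\Big)^{\log \log n} d\xi\Big)^{1/\log \log n}$$
$$= \int_0^1 \Big(\frac{15}{16} + \frac{1}{16}\cos(2\pi \xi)\Big)^{\log \log n} d\xi$$
$$= o(1)$$

as desired, since $K \ge \log \log n$. □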

Now we can quickly conclude the proof of (17). From (19) we have the sumset
inclusion

$$\{\xi \in [0, 1] : F(\xi) > \alpha\} + \{\xi \in [0, 1] : F(\xi) > \alpha\} \subseteq \{\xi \in [0, 1] : G(\xi) > \alpha\}$$

for any $\alpha > 0$. Taking measures of both sides and applying the Mann-Kneser-Macbeath
"$\alpha + \beta$ inequality" $|A + B| \ge \min(|A| + |B|, 1)$ (see [9]), we obtain

$$\min(2|\{\xi \in [0, 1] : F(\xi) > \alpha\}|, 1) \le |\{\xi \in [0, 1] : G(\xi) > \alpha\}|.$$

But from (20) we see that $|\{\xi \in [0, 1] : G(\xi) > \alpha\}|$ is strictly less than 1 if $\alpha > o(1)$.
Thus we conclude that

$$|\{\xi \in [0, 1] : F(\xi) > \alpha\}| \le \frac{1}{2}|\{\xi \in [0, 1] : G(\xi) > \alpha\}|$$

when $\alpha > o(1)$. Integrating this in $\alpha$, we obtain

$$\int_{\xi \in [0,1] : F(\xi) > o(1)} F(\xi)\,d\xi \le \frac{1}{2} \int_0^1 G(\xi)\,d\xi.$$

On the other hand, from (18) we see that when $F(\xi) \le o(1)$, then $F(\xi) = o(1) \cdot F(\xi)^{1/4} \le
o(1) \cdot G(\xi)$, and thus

$$\int_{\xi \in [0,1] : F(\xi) \le o(1)} F(\xi)\,d\xi \le o(1) \int_0^1 G(\xi)\,d\xi.$$

Adding these two inequalities we obtain (17) as desired. This proves Lemma 6.2.
□

Remark 7.2. A similar Fourier-analytic argument can be used to prove Lemma 3.4.
To see this, we first recall Esseen's concentration inequality [4]

$$P(X \in I) \le C \int_{|t| \le 1} |E(e^{itX})|\,dt$$

for any random variable $X$ and any interval $I$ of length at most 1. Thus to prove
Lemma 3.4 it would suffice to show that

$$\int_{|t| \le 1} \Big|E\Big(\exp\Big(it \sum_{j=1}^k a_j \epsilon_j\Big)\Big)\Big|\,dt = O(1/\sqrt{k}).$$

But by the independence of the $\epsilon_j$, we have

$$\Big|E\Big(\exp\Big(it \sum_{j=1}^k a_j \epsilon_j\Big)\Big)\Big| = \prod_{j=1}^k |E(e^{it a_j \epsilon_j})| = \prod_{j=1}^k |\cos(t a_j)|$$

and hence by Hölder's inequality

$$\int_{|t| \le 1} \Big|E\Big(\exp\Big(it \sum_{j=1}^k a_j \epsilon_j\Big)\Big)\Big|\,dt \le \prod_{j=1}^k \Big(\int_{|t| \le 1} |\cos(t a_j)|^k\,dt\Big)^{1/k}.$$

But since each $a_j$ has magnitude at least 1, it is easy to check that $\int_{|t| \le 1} |\cos(t a_j)|^k\,dt =
O(1/\sqrt{k})$, and the claim follows.
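
The final claim of the remark is easy to probe numerically. The following is a minimal sketch, assuming the integration range is $|t| \le 1$ as in Esseen's inequality; it evaluates the integral of $|\cos(ta)|^k$ by a midpoint rule, and the function name and the sample values of `a` and `k` are illustrative.

```python
import math

def cos_power_integral(a, k, steps=20000):
    """Midpoint-rule value of the integral of |cos(t*a)|^k over |t| <= 1."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        t = -1.0 + (i + 0.5) * h
        total += abs(math.cos(t * a)) ** k
    return total * h

if __name__ == "__main__":
    for a in (1.0, 1.5, 3.2, 10.0):
        for k in (4, 25, 100, 400):
            # The rescaled value should stay bounded as k grows.
            print(a, k, cos_power_integral(a, k) * math.sqrt(k))
```

For $|a| \ge 1$ the rescaled quantity `cos_power_integral(a, k) * sqrt(k)` stays bounded: each peak of $|\cos|^k$ contributes about $\sqrt{2\pi/k}$, and after the substitution $s = ta$ the number of peaks in range is proportional to $|a|$ while the Jacobian contributes $1/|a|$.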



8. Extensions and Refinements 



8.1. Singularity of more general random matrices. In [8], Komlós extended
Theorem 1.2 by showing that the singularity probability is still $o(1)$ for a random
matrix whose entries are i.i.d. random variables with non-degenerate distribution.
By slightly modifying our proof of Theorem 1.2, we are able to prove a different
extension.

We say that a random variable $\xi$ has the $(c, p)$-property if

$$\min\{P(\xi \ge c), P(\xi \le -c)\} \ge p.$$
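
As a small illustration of the definition, the sketch below computes, for a finitely supported distribution, the largest $p$ for which the $(c, p)$-property holds, reading both inequalities as non-strict. The three sample distributions and the helper name `best_p` are hypothetical and serve only as examples.

```python
from fractions import Fraction

def best_p(dist, c):
    """Largest p such that the discrete law `dist` (value -> probability)
    has the (c, p)-property, i.e. min(P(xi >= c), P(xi <= -c))."""
    upper = sum(w for x, w in dist.items() if x >= c)
    lower = sum(w for x, w in dist.items() if x <= -c)
    return min(upper, lower)

# Uniform +-1 (Bernoulli) entries.
bernoulli = {1: Fraction(1, 2), -1: Fraction(1, 2)}
# A "lazy" sign that is zero most of the time.
lazy = {1: Fraction(1, 32), 0: Fraction(15, 16), -1: Fraction(1, 32)}
# A variable that never takes negative values.
one_sided = {0: Fraction(1, 2), 2: Fraction(1, 2)}

if __name__ == "__main__":
    for name, dist in (("bernoulli", bernoulli), ("lazy", lazy),
                       ("one_sided", one_sided)):
        print(name, best_p(dist, 1))
```

A distribution supported on one side of the origin, such as `one_sided`, fails the property for every $c > 0$, since one of the two tail probabilities vanishes.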



Let $\xi_{ij}$, $1 \le i,j \le n$, be independent random variables. Assume that there are
positive constants $c$ and $p$ (not depending on $n$) such that for all $1 \le i,j \le n$, $\xi_{ij}$
has the $(c, p)$-property. The new feature here is that we do not require the $\xi_{ij}$ to be identically distributed.
Theorem 8.2. Let $\xi_{ij}$, $1 \le i,j \le n$, be as above. Let $M_n$ be the random matrix
with entries $\xi_{ij}$. Then

$$P(\det M_n = 0) = o(1).$$
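
The theorem is asymptotic, and for very small $n$ the singularity probability is far from negligible. As a hedged illustration, the sketch below treats only the special case of i.i.d. uniform $\pm 1$ entries (which have the $(1, 1/2)$-property) and enumerates all matrices exactly; it does not cover the non-identically distributed case, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def det(rows):
    """Integer determinant by Laplace expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for col in range(n):
        minor = [r[:col] + r[col + 1:] for r in rows[1:]]
        total += (-1) ** col * rows[0][col] * det(minor)
    return total

def singular_probability(n):
    """Exact P(det = 0) for an n x n matrix of i.i.d. uniform +-1 entries."""
    singular = 0
    for entries in product((-1, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        if det(rows) == 0:
            singular += 1
    return Fraction(singular, 2 ** (n * n))

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, singular_probability(n))
```

For $n = 2$ this gives $1/2$ and for $n = 3$ it gives $5/8$. Brute-force enumeration is only practical for very small $n$, since there are $2^{n^2}$ matrices.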

We only sketch the proof, which follows the proof of Theorem 1.2 very closely
and uses the same notation: $X_1, \ldots, X_n$ are the row vectors of $M_n$ and $W_j$ is the
subspace spanned by $X_1, \ldots, X_j$. We will show

$$\sum_{j=1}^{n-1} P(X_{j+1} \in W_j) = o(1). \qquad (21)$$

This estimate is a consequence of the following two lemmas, which are generalizations
of Lemmas 2.1 and 3.1.

Lemma 8.3. Let $W$ be a $k$-dimensional subspace of $\mathbb{R}^n$. Then for any $1 \le j \le n$

$$P(X_j \in W) \le (1 - p)^{n-k}.$$

Lemma 8.4. For any $n/2 \le j \le n$

$$P(X_j \in W_{j-1}) = O(1/\sqrt{\ln n}).$$

The proof of Lemma 8.3 is the same as that of Lemma 2.1. The only information
we need is that for any fixed number $x$ and any admissible $i, j$, $P(\xi_{ij} = x) \le 1 - p$.

To prove Lemma 8.4, let us consider the case $j = n$ (the proof is the same for other
cases). We need to modify the definition of universality as follows.

We call a subset $V$ of $n$-dimensional vectors $k$-universal if for any set of $k$ indices
$1 \le i_1 < i_2 < \cdots < i_k \le n$ and any sign sequence $\epsilon_1, \ldots, \epsilon_k$, one can find a vector
$v \in V$ such that the $i_j$-th coordinate of $v$ has sign $\epsilon_j$ and absolute value at least $c$.

In what follows, we set $l = \ln n / 10$. We first show that $X_1, \ldots, X_n$ is very likely to be
$l$-universal. (Notice that the $X_j$ have different distributions.)

Lemma 8.5. With probability $1 - o(1/n)$, $X_1, \ldots, X_n$ is $l$-universal.

Proof of Lemma 8.5. Fix a set of indices and a sequence of signs. For any
$1 \le i \le n$ the probability that $X_i$ fails is at most $1 - p^l$. The rest of the proof is
the same. □
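
The omitted part of the proof is presumably a union bound over all index sets and sign sequences; that reading is an assumption, since the text only says the rest is the same. Under it, the failure probability is at most $\binom{n}{l} 2^l (1 - p^l)^n$, using independence of the rows. The sketch below evaluates the logarithm of this quantity, with $l$ rounded down to an integer (and at least 1) for the sake of the computation; the function name is illustrative.

```python
import math

def log_union_bound(n, p):
    """Natural log of C(n, l) * 2^l * (1 - p^l)^n with l = floor(ln(n)/10),
    taken to be at least 1."""
    l = max(1, int(math.log(n) / 10))
    log_choices = math.log(math.comb(n, l)) + l * math.log(2)
    log_all_rows_fail = n * math.log1p(-p ** l)
    return log_choices + log_all_rows_fail

if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        # Compare with log(1/n), the target order for an o(1/n) bound.
        print(n, log_union_bound(n, 0.5), -math.log(n))
```

For fixed $p$ that is not too small, the term $n \ln(1 - p^l)$ dominates and the bound falls far below $1/n$.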



It follows that 
Corollary 8.6. Let $H$ be a subspace spanned by $n - 1$ random vectors. Then
with probability $1 - o(1/n)$, any unit vector perpendicular to $H$ has at least $l + 1$
coordinates whose absolute values are at least where K is a constant depending 
on c. 

The last ingredient is the following generalization of Lemma 3.4.

Lemma 8.7. Let $a_1, \ldots, a_k$ be real numbers with absolute values larger than one
and $\epsilon_1, \ldots, \epsilon_k$ be independent random variables satisfying the $(c, p)$-property. Then
for any interval $I$ of length one

$$P\Big(\sum_{i=1}^k a_i \epsilon_i \in I\Big) = O(1/\sqrt{k}).$$

Theorem 8.2 follows from Corollary 8.6 and Lemma 8.7. To conclude, let us remark
that statements more accurate than Lemma 8.7 are known (see e.g. [5]). However,
this lemma can be proved using an argument similar to the one in Remark 7.2.
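
The $1/\sqrt{k}$ rate can be seen exactly in the simplest instance. The sketch below assumes uniform $\pm 1$ signs and takes every coefficient equal to one (the boundary case of the coefficient condition); the sum then has fixed parity, so an interval of length one contains at most one attainable value, and the probability is at most the largest atom of the sum. The function name is illustrative.

```python
import math

def largest_atom(k):
    """Largest point probability of eps_1 + ... + eps_k for i.i.d.
    uniform +-1 signs, namely C(k, floor(k/2)) / 2^k."""
    return math.comb(k, k // 2) / 2 ** k

if __name__ == "__main__":
    for k in (1, 2, 10, 100, 1000):
        # The rescaled value approaches sqrt(2/pi) from below.
        print(k, largest_atom(k), largest_atom(k) * math.sqrt(k))
```

The rescaled value `largest_atom(k) * sqrt(k)` stays below $\sqrt{2/\pi} \approx 0.798$, consistent with a bound of the form $O(1/\sqrt{k})$.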

8.8. Determinants of more general random matrices. Let $\xi_{ij}$, $1 \le i,j \le n$,
be a set of independent (but not necessarily i.i.d.) random variables with the following two
properties:

• Each $\xi_{ij}$ has mean zero and variance one.

• There is a constant $K$ such that $|\xi_{ij}| \le K$ with probability one.

These two properties imply the following property:

• There are constants $\delta > 0$ and $\delta' > 0$ such that for any interval $I$ of length
$2\delta$, $P(\xi_{ij} \in I) \le 1 - \delta'$ for all $1 \le i,j \le n$.

Theorem 8.9. Consider the random matrix $M_n$ with entries $\xi_{ij}$ as above. Let $\epsilon$
be an arbitrary positive constant. With probability $1 - o(1)$,

$$|\det M_n| \ge \sqrt{n!}\,\exp(-n^{1/2+\epsilon}).$$

Notice that Lemma 2.1 holds for this model of random matrices, since the last
property of the $\xi_{ij}$ implies that $\xi_{ij}$ has the $(c, p)$-property.

Next, consider Lemma 2.2. Consider a row vector, say, $X = (\xi_{11}, \ldots, \xi_{1n})$ and a
fixed subspace $W$ of dimension $d$. Again, we have (with the same notation as in
Section 2)



n n 

j=i k=i 
However, it is no longer the case that the last formula equals



j=l k=l 

since the $\xi_{ij}$ are not Bernoulli random variables. On the other hand, we can have
something similar with an extra error term. It is easy to show, using Chernoff's
bound, that

$$|X|^2 = \sum_{j=1}^n \xi_{1j}^2 \ge n - C n^{1/2} \ln n$$

holds with probability at least $1 - 1/(2n^2)$, for some sufficiently large $C$. Similarly,

$$\sum_{j=1}^n \xi_{1j}^2 p_{jj} \le d + C n^{1/2} \ln n$$

holds with probability at least $1 - 1/(2n^2)$. (The use of Chernoff's bound requires
the random variables $\xi_{ij}$ to be bounded. One can, of course, use some other method to
remove this assumption.)

The probability 1/n^ is negligible. Moreover, we can apply Talagrand's inequality 
the same way as before. However, because of the new error term Cn^/^ Inn, we 
cannot set do = n — In^^^ n, but have to stop at n — Cn^^^ Inn. In order to handle 
the cases when In^^^ n < n — d < C'n}/'^ In n, we need the following lemma, due to 
Bourgain (private conversation), which can be seen as an extension of Lemma 2.1. 

Lemma 8.10. There are constants $\alpha > 0$, $1 > \delta > 0$ such that the following holds.
Let $W$ be a fixed subspace of dimension $d \le n - 1$ and $X$ a random (row) vector.
Then

$$P\Big(\mathrm{dist}(X, W) \le \frac{\alpha}{\sqrt{n}}\Big) \le \delta^{n-d}.$$

Proof of Lemma 8.10. We construct unit vectors $Z_1, \ldots, Z_{n-d}$ (not necessarily
orthogonal) in the orthogonal complement $W^\perp$ of $W$ as follows. We let $Z_1$ be an
arbitrary unit vector in $W^\perp$; since $Z_1$ has unit length, at least one of its coordinates
has magnitude at least $1/\sqrt{n}$. Without loss of generality we may assume that it is
the first coordinate $\langle Z_1, e_1 \rangle$ which has magnitude at least $1/\sqrt{n}$. Now we let $Z_2$ be
an arbitrary unit vector in $W^\perp \cap e_1^\perp$ (which has dimension at least $n - d - 1$); then
$Z_2$ is orthogonal to $e_1$ and has a coordinate of magnitude at least $1/\sqrt{n}$. Without
loss of generality we may take $|\langle Z_2, e_2 \rangle| \ge 1/\sqrt{n}$. Continuing in this fashion, we can
(without loss of generality) find $Z_1, \ldots, Z_{n-d} \in W^\perp$ such that each $Z_j$ is orthogonal
to $e_1, \ldots, e_{j-1}$ and is such that $|\langle Z_j, e_j \rangle| \ge 1/\sqrt{n}$.

Now suppose that $X = (\epsilon_1, \ldots, \epsilon_n)$ is such that $\mathrm{dist}(X, W) \le \alpha/\sqrt{n}$, where $\alpha$ is a
sufficiently small positive constant. Fix the last $d$ coordinates $\epsilon_{n-d+1}, \ldots, \epsilon_n$ and
let $T$ denote the set of all vectors $X$ with these fixed coordinates satisfying

$$\mathrm{dist}(X, W) \le \frac{\alpha}{\sqrt{n}}.$$

Fix a vector $X_0 = (g_1, \ldots, g_n)$ in $T$. It is easy to show that for any vector $X =
(g'_1, \ldots, g'_n) \in T$, $|g'_i - g_i| \le 2\alpha$, for all $1 \le i \le n - d$. On the other hand, if $\alpha$ is
sufficiently small, then by the third property of the $\xi_{ij}$ there is a positive constant
$\delta < 1$ such that the set of $g'_i$ where $|g'_i - g_i| \le 2\alpha$ has measure at most $\delta$ for all
$1 \le i \le n - d$. This proves the claim. □
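
The greedy choice of $Z_1, \ldots, Z_{n-d}$ can be sketched in code. The following is a minimal illustration, assuming we are handed linearly independent vectors spanning the orthogonal complement of $W$. Instead of relabelling coordinates, it records for each $Z_j$ the index of its largest coordinate (the "pivot") and forces all later vectors to vanish there. The function name and the sample basis are hypothetical.

```python
import math

def greedy_vectors(basis):
    """Return a list of (Z, pivot) pairs: each Z is a unit vector in the span
    of `basis`, vanishes on all earlier pivots, and has |Z[pivot]| >= 1/sqrt(n)."""
    work = [list(map(float, v)) for v in basis]
    chosen = []
    while work:
        v = work.pop(0)
        norm = math.sqrt(sum(x * x for x in v))
        z = [x / norm for x in v]
        # A unit vector in R^n has a coordinate of magnitude >= 1/sqrt(n).
        pivot = max(range(len(z)), key=lambda i: abs(z[i]))
        chosen.append((z, pivot))
        reduced = []
        for u in work:
            # Subtract the multiple of z that cancels coordinate `pivot`,
            # so that u stays in the span and becomes orthogonal to e_pivot.
            factor = u[pivot] / z[pivot]
            w = [ui - factor * zi for ui, zi in zip(u, z)]
            w[pivot] = 0.0  # remove rounding residue
            reduced.append(w)
        work = reduced
    return chosen

if __name__ == "__main__":
    sample = [(1, 1, 1, 1), (1, -1, 0, 0), (0, 0, 1, -1)]
    for z, pivot in greedy_vectors(sample):
        print(pivot, [round(x, 3) for x in z])
```

This is Gaussian elimination with pivoting on the largest entry; the pivots play the role of the coordinates $e_1, e_2, \ldots$ after the relabelling that the proof performs without loss of generality.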

The rest of the proof is basically the same, with some minor and natural modifications
in the calculation. The error term obtained from Lemma 8.10 (in the
determinant) is only

$$n^{-O(n^{1/2} \ln n)} = \exp(-o(n^{1/2+\epsilon}))$$

for any fixed $\epsilon > 0$.
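
To see why this error term is harmless, read it as $n^{-C n^{1/2} \ln n} = \exp(-C n^{1/2} \ln^2 n)$ for some constant $C$ (an assumption about the implied constant) and compare its exponent with $n^{1/2+\epsilon}$. The ratio of the two exponents is $C \ln^2 n / n^{\epsilon}$, which the sketch below evaluates in log-space so that very large $n$ can be used; the function name is illustrative.

```python
import math

def exponent_ratio(n, eps, C=1.0):
    """Ratio (C * n^(1/2) * ln(n)^2) / n^(1/2 + eps) = C * ln(n)^2 / n^eps."""
    log_n = math.log(n)
    return C * log_n ** 2 * math.exp(-eps * log_n)

if __name__ == "__main__":
    for exponent in (20, 40, 100, 400):
        print(exponent, exponent_ratio(10 ** exponent, 0.1))
```

The ratio tends to zero for every fixed $\epsilon > 0$, but for small $\epsilon$ it only becomes small once $n$ is astronomically large, which is typical of asymptotic statements of this kind.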

In certain situations, we do not have the assumption that the $|\xi_{ij}|$ are bounded from
above by a constant. We are going to consider the following model. Let $\xi_{ij}$, $1 \le
i,j \le n$, be i.i.d. random variables with mean zero and variance one. Assume
furthermore that their fourth moment is finite. Consider the random matrix $M_n$
with the $\xi_{ij}$ as its entries.

By using Lemmas 2.1 and 3.1 and replacing Lemma 2.2 by a result of Bai and
Yin [1], which asserts that the volume of the $(1 - \gamma)n$-dimensional parallelepiped
spanned by the first $(1 - \gamma)n$ row vectors is at least $n^{(1/2 - \gamma/2 - o(1))n}$ with probability
$1 - o(1)$ for any fixed $\gamma > 0$, we can prove

Theorem 8.11. We have, with probability $1 - o(1)$, that

$$|\det M_n| \ge n^{(1/2 - o(1))n}.$$